bug fix: compute_reward for batch input #153
Conversation
Hi @nicehiro! Can you give me a bit more detail on what type of bug this is? Some code example would help. Also, can you run
gymnasium_robotics/envs/maze/maze.py (Outdated)

    elif self.reward_type == "sparse":
-       return 1.0 if np.linalg.norm(achieved_goal - desired_goal) <= 0.45 else 0.0
+       return -(d > 0.45).astype(np.float32)
Why are you changing the reward function? With your implementation it returns -1 if the goal is not reached and 0 otherwise.
Hi @rodrigodelazcano, thanks for your great work!
We expect to get a list of rewards, one for each observation passed to compute_reward.
A reward of -1 is also more convenient for training in sparse-reward environments, because a reward of 0 generates no gradient. By the way, all the changes follow the design here. Here's a simple example:
Before:
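(The original snippet did not survive here; as a rough sketch of what it presumably showed, assuming the old sparse branch of AntMaze's compute_reward:)

```python
import numpy as np

# Old behaviour (sketch): np.linalg.norm without an axis collapses the whole
# batch to a single scalar, so a (batch_size, 2) goal array yields one reward.
achieved_goal = np.array([[0.0, 0.0], [1.0, 1.0]])  # batch of 2 achieved goals
desired_goal = np.array([[0.1, 0.0], [5.0, 5.0]])   # batch of 2 desired goals

reward = 1.0 if np.linalg.norm(achieved_goal - desired_goal) <= 0.45 else 0.0
print(reward)  # 0.0 -- a single scalar for the whole batch, not one reward per sample
```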
After:
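(Again a sketch, not the original snippet, assuming the reward computation introduced by this PR:)

```python
import numpy as np

# New behaviour (sketch): the distance is taken per sample along the last axis,
# so batch input yields one reward per sample.
achieved_goal = np.array([[0.0, 0.0], [1.0, 1.0]])
desired_goal = np.array([[0.1, 0.0], [5.0, 5.0]])

d = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
reward = -(d > 0.45).astype(np.float32)
print(reward)  # [-0. -1.] -- 0 when the goal is reached, -1 otherwise
```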
@nicehiro The suggested reward change makes sense, but as we are trying to preserve the original, unmaintained work, we are going to keep the reward function as is.
If you want to use the alternative reward function, I would recommend using a reward wrapper to modify the value.
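(A minimal sketch of such a wrapper, assuming the standard gymnasium.RewardWrapper API; the environment id below is only an example, not something fixed by this PR:)

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  -- makes the maze environments available


class NegativeSparseReward(gym.RewardWrapper):
    """Map the sparse step reward from {0, 1} to {-1, 0}."""

    def reward(self, reward):
        return reward - 1.0


# Hypothetical usage; "AntMaze_UMaze-v4" is just an example id.
env = NegativeSparseReward(gym.make("AntMaze_UMaze-v4"))
```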
Could you also add a test that confirms the function works as expected with both single and multiple rewards?
Otherwise, this looks good to merge.
@nicehiro Could you add the multi-fetch environment in a different PR?
Yes. Sorry about that.
Yes. Is it OK if I add a test in
Yes, I think that makes sense.
    elif self.reward_type == "sparse":
-       return 1.0 if np.linalg.norm(achieved_goal - desired_goal) <= 0.45 else 0.0
+       return -(d > 0.45).astype(np.float32)
fp64 not fp32
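(Presumably suggesting that the cast in the changed line use float64 rather than float32; a standalone sketch with hypothetical names:)

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal):
    # Hypothetical standalone version of the changed line, cast to float64.
    d = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return -(d > 0.45).astype(np.float64)
```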
I'm merging this. Thank you @nicehiro, and thanks for the reviews @pseudo-rnd-thoughts and @Kallinteris-Andreas.
@rodrigodelazcano Changing the reward function would require a new revision. Also, my suggested changes have not been applied; should we revert?
This reverts commit 65bfa85.
@Kallinteris-Andreas I think we should revert the reward function change, but the batch input fix shouldn't require a change of version.
Sorry @pseudo-rnd-thoughts and @Kallinteris-Andreas. I thought the changes had been made. Can you make a PR with the old reward function?
Also, I have no idea what axis=-1 does.
Yeah, we need testing for this as well.
Basically, this is used to compute vector norms along a specific axis of a matrix, instead of a single norm of the full matrix; in this case the last axis. Since the fix is for computing rewards for batch inputs of achieved and desired goals, this addition is required. Am I missing anything @nicehiro?
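(A small illustration of the difference, not from the original thread:)

```python
import numpy as np

diff = np.array([[3.0, 4.0],
                 [6.0, 8.0]])  # batch of 2 goal differences

print(np.linalg.norm(diff))           # 11.18... -- one norm of the whole matrix
print(np.linalg.norm(diff, axis=-1))  # [ 5. 10.] -- one vector norm per row
```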
Yes. This should work for both single and batch inputs.
I tested it a long time ago; sorry, I don't have a laptop to test it right now.
To clarify, I know what axis=-1 does.
Description
Fix: the compute_reward function cannot compute rewards for batch input in the AntMaze environment.
Checklist:
- I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)